Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards

Neural Information Processing Systems

Incrementality, which measures the causal effect of showing an ad to a potential customer (e.g. a user on an internet platform) versus not showing it, is a central object for advertisers in online advertising platforms. This paper investigates how an advertiser can learn to optimize the bidding sequence in an online manner without knowing the incrementality parameters in advance. We formulate the offline version of this problem as a specially structured episodic Markov Decision Process (MDP) and then, for its online learning counterpart, propose a novel reinforcement learning (RL) algorithm with regret at most $\widetilde{O}(H^2 \sqrt{T})$, which depends on the number of rounds $H$ and the number of episodes $T$, but does not depend on the number of actions (i.e., possible bids). A fundamental difference between our learning problem and standard RL problems is that the realized reward feedback from conversion incrementality is mixed and delayed. To handle this difficulty we propose and analyze a novel pairwise moment-matching algorithm to learn the conversion incrementality, which we believe is of independent interest.







Comparing Uniform Price and Discriminatory Multi-Unit Auctions through Regret Minimization

arXiv.org Machine Learning

Repeated multi-unit auctions, where a seller allocates multiple identical items over many rounds, are common mechanisms in electricity markets and treasury auctions. We compare the two predominant formats: uniform-price and discriminatory auctions, focusing on the perspective of a single bidder learning to bid against stochastic adversaries. We characterize the learning difficulty in each format, showing that the regret scales similarly for both auction formats under both full-information and bandit feedback, as $\tilde{\Theta}(\sqrt{T})$ and $\tilde{\Theta}(T^{2/3})$, respectively. However, analysis beyond worst-case regret reveals structural differences: uniform-price auctions may admit faster learning rates, with regret scaling as $\tilde{\Theta}(\sqrt{T})$ in settings where discriminatory auctions remain at $\tilde{\Theta}(T^{2/3})$. Finally, we provide a specific analysis for auctions in which the other participants are symmetric and have unit-demand, and show that in these instances, a similar regret rate separation appears.
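
As a rough illustration of how the two formats differ, the sketch below computes allocations and payments for a single round. It is not taken from the paper: the function name `run_auction`, the bidder names, and the bid values are hypothetical, and the uniform clearing price is assumed to be the last accepted bid (some uniform-price variants use the first rejected bid instead). Ties are broken by the order in which bids were pooled.

```python
from typing import Dict, List, Tuple


def run_auction(
    bids_by_bidder: Dict[str, List[float]],
    num_items: int,
    fmt: str = "uniform",
) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Allocate num_items identical items to the highest bids.

    fmt="uniform": every winning bid pays the same clearing price,
                   assumed here to be the lowest winning bid.
    fmt="discriminatory": every winning bid pays its own value.
    """
    if fmt not in ("uniform", "discriminatory"):
        raise ValueError(f"unknown auction format: {fmt}")

    # Pool all per-unit bids, remembering which bidder submitted each one.
    pooled = [
        (bid, name)
        for name, bids in bids_by_bidder.items()
        for bid in bids
    ]
    # Sort from highest to lowest bid; sorted() is stable, so ties keep pooling order.
    pooled = sorted(pooled, key=lambda pair: -pair[0])

    winners = pooled[:num_items]
    # Assumption: the clearing price is the last accepted (lowest winning) bid.
    clearing_price = winners[-1][0] if winners else 0.0

    units = {name: 0 for name in bids_by_bidder}
    payments = {name: 0.0 for name in bids_by_bidder}
    for bid, name in winners:
        units[name] += 1
        payments[name] += clearing_price if fmt == "uniform" else bid
    return units, payments


# Hypothetical example: two bidders compete for three identical items.
example_bids = {"a": [10.0, 6.0, 2.0], "b": [8.0, 5.0]}
uniform_units, uniform_pay = run_auction(example_bids, 3, "uniform")
disc_units, disc_pay = run_auction(example_bids, 3, "discriminatory")
```

In this example both formats allocate the items identically, but the payments differ: under the uniform rule each unit costs the clearing price, while under the discriminatory rule each winner pays the sum of their own winning bids. That difference in how a bidder's own bids enter the payment is one intuition for why the two formats can behave differently for a learner.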


Improved learning rates in multi-unit uniform price auctions

Marius Potfer, Dorian Baudry, Hugo Richard

Neural Information Processing Systems

Motivated by the strategic participation of electricity producers in electricity day-ahead markets, we study the problem of online learning in repeated multi-unit uniform price auctions, focusing on the setting where opposing bids are adversarial. The main contribution of this paper is the introduction of a new model of the bid space.


Learning and Collusion in Multi-unit Auctions

Neural Information Processing Systems

In a carbon auction, licenses for CO2 emissions are allocated among multiple interested players. Inspired by this setting, we consider repeated multi-unit auctions with uniform pricing, which are widely used in practice. Our contribution is to analyze these auctions in both the offline and online settings, by designing efficient bidding algorithms with low regret and giving regret lower bounds. We also analyze the quality of the equilibria in two main variants of the auction, finding that one variant is susceptible to collusion among the bidders while the other is not.